[Kernel] Integrate CUTLASS MoE kernel with PPLX #18762

ElizaWszola · 2025-05-27T12:35:22Z

Integrate CUTLASS MoE fp8 kernels with PPLX.

Unit tests:

tests/kernels/moe/test_pplx_cutlass_moe.py

E2E testing:

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
export VLLM_ALL2ALL_BACKEND=pplx
python3 examples/offline_inference/data_parallel.py \
        --model="nm-testing/DeepSeek-Coder-V2-Lite-Instruct-FP8" \
        --dp-size=2 \
        --tp-size=1 \
        --trust-remote-code

Signed-off-by: ElizaWszola <ewszola@redhat.com>

tests/kernels/moe/test_pplx_cutlass_moe.py

vllm/model_executor/layers/fused_moe/layer.py

vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py

tlrmchlsmth

Left a few comments, but looks good overall. Let's try to get it landed once those and Bill's comments are addressed!

csrc/quantization/cutlass_w8a8/moe/moe_data.cu

csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu

tests/kernels/moe/test_pplx_cutlass_moe.py

vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py

mergify · 2025-06-03T19:31:21Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ElizaWszola.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

vllm/model_executor/layers/fused_moe/cutlass_moe.py

mergify · 2025-06-05T16:50:13Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ElizaWszola.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

…#18762)" This reverts commit 84166fe. Signed-off-by: Bill Nell <bnell@redhat.com>

bennorris123 · 2025-08-27T13:19:49Z

Hi @ElizaWszola, is the pplx kernel now included in the vLLM image by default, or do we still need to go through the additional installation steps for expert parallelism (https://github.com/vllm-project/vllm/tree/main/tools/ep_kernels)? Thanks!

ElizaWszola added 7 commits May 19, 2025 13:39

Cutlass MoE pplx - working unit tests

888177a

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Working e2e, but there are some hacks and it needs cleaning

6486345

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Set the correct workspace shapes, padded and unpadded c1,c2,c3

5d4751f

Signed-off-by: ElizaWszola <ewszola@redhat.com>

format

9cb7802

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Merge branch 'main' into cutlass-moe-pplx-integration

f34d6b1

Signed-off-by: ElizaWszola <ewszola@redhat.com>

uncomment quant_method selection

9499f74

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Working e2e after merge

df1a014

Signed-off-by: ElizaWszola <ewszola@redhat.com>

ElizaWszola changed the title ~~[Kernel] Integrate CUTLASS MoE kernel with PPLX~~ [WIP][Kernel] Integrate CUTLASS MoE kernel with PPLX May 27, 2025

ElizaWszola added 3 commits May 27, 2025 15:01

Nuke output map codepath, clean up a bit

503a9b3

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Fix the non-pplx codepath

8c1d57b

Signed-off-by: ElizaWszola <ewszola@redhat.com>

CUDA kernel for pplx data computation, cleanups, fixing unit tests and benchmarks

268bbea

Signed-off-by: ElizaWszola <ewszola@redhat.com>
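The "pplx data computation" commit above refers to the bookkeeping a CUTLASS grouped GEMM needs before it can run on PPLX-dispatched tokens. The PR implements this step as a CUDA kernel. What follows is only a minimal Python sketch of the idea. It assumes PPLX's batched layout, in which each local expert owns a fixed slot of `padded_m` rows and the number of valid rows per expert arrives as `expert_num_tokens`. It also assumes the usual MoE shapes: a gate+up GEMM producing `2 * intermediate_size` columns, followed by a down GEMM back to `hidden_size`. `compute_pplx_moe_mm_data` and `GroupedGemmData` are hypothetical names used for illustration; they are not vLLM's API.

```python
from dataclasses import dataclass


@dataclass
class GroupedGemmData:
    # Hypothetical container: row where each local expert's slot starts in the batched buffer.
    expert_offsets: list[int]
    # Per-expert (M, N, K) = (rows, output columns, reduction dim) for the gate+up GEMM.
    problem_sizes1: list[tuple[int, int, int]]
    # Per-expert (M, N, K) for the down GEMM.
    problem_sizes2: list[tuple[int, int, int]]


def compute_pplx_moe_mm_data(
    expert_num_tokens: list[int],
    padded_m: int,
    hidden_size: int,
    intermediate_size: int,
) -> GroupedGemmData:
    """Hypothetical sketch, assuming fixed-size per-expert slots of padded_m rows."""
    offsets, sizes1, sizes2 = [], [], []
    for expert, m in enumerate(expert_num_tokens):
        if not 0 <= m <= padded_m:
            raise ValueError(f"expert {expert} got {m} tokens; slot holds {padded_m}")
        # Slots have a fixed size, so the offset depends only on the expert index.
        offsets.append(expert * padded_m)
        # Only the m valid rows are multiplied; the padded tail of each slot is skipped.
        sizes1.append((m, 2 * intermediate_size, hidden_size))
        sizes2.append((m, hidden_size, intermediate_size))
    return GroupedGemmData(offsets, sizes1, sizes2)


data = compute_pplx_moe_mm_data([3, 0, 5], padded_m=8, hidden_size=16, intermediate_size=32)
print(data.expert_offsets)  # [0, 8, 16]
print(data.problem_sizes1)  # [(3, 64, 16), (0, 64, 16), (5, 64, 16)]
print(data.problem_sizes2)  # [(3, 16, 32), (0, 16, 32), (5, 16, 32)]
```

Under these assumptions, the padded slot size fixes the offsets, while the unpadded token counts set each expert's M. This is why padded and unpadded shapes have to be tracked separately.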

mergify bot added the ci/build label May 28, 2025

ElizaWszola added 3 commits May 28, 2025 14:13

Various cleanups

a6236bf

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Better types and attribute checks

ab46919

Signed-off-by: ElizaWszola <ewszola@redhat.com>

Missing return, type check ignore

3e436fd

Signed-off-by: ElizaWszola <ewszola@redhat.com>

ElizaWszola changed the title ~~[WIP][Kernel] Integrate CUTLASS MoE kernel with PPLX~~ [Kernel] Integrate CUTLASS MoE kernel with PPLX May 28, 2025

ElizaWszola marked this pull request as ready for review May 28, 2025 16:10

ElizaWszola requested review from WoosukKwon, mgoin, robertgshaw2-redhat and tlrmchlsmth as code owners May 28, 2025 16:10